Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Improve Dictionary FindEntry CQ #15460

Closed
wants to merge 2 commits into from
Closed

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Dec 11, 2017

Total bytes of diff: -1249 (-0.04% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
       -1249 : System.Private.CoreLib.dasm (-0.04% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
        -683 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(ref):int:this (11 methods)
        -235 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(struct):int:this (7 methods)
        -158 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(int):int:this (5 methods)
        -124 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(long):int:this (4 methods)
         -32 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(short):int:this

6 total methods with size differences (6 improved, 0 regressed), 16739 unchanged.

@benaadams
Copy link
Member Author

PTAL @jkotas

@jkotas
Copy link
Member

jkotas commented Dec 11, 2017

Could you please also do sanity check how the change affects throughput e.g. run Dictionary<int,int>.ContainsKey in a loop for positive and negative case? I would expect it to be same or slightly better - just want to make sure that there is nothing falling through cracks.

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

Added end of function to asm also (post loop)

@benaadams
Copy link
Member Author

Could you please also do sanity check how the change affects throughput

Hmm... will work on it a bit more... https://gist.github.com/benaadams/3e2ba5bbc0f265f7ac2dbd5a7f2b1b67

Before

                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive | 10.775 ns | 0.0223 ns | 0.0208 ns |
    ContainsKeySimpleNegative |  7.391 ns | 0.0180 ns | 0.0159 ns |
 ContainsKeyCollisionPositive | 21.163 ns | 0.0257 ns | 0.0215 ns |
 ContainsKeyCollisionNegative | 20.444 ns | 0.0384 ns | 0.0359 ns |

After

                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive | 11.086 ns | 0.0251 ns | 0.0223 ns |
    ContainsKeySimpleNegative |  7.665 ns | 0.0094 ns | 0.0088 ns |
 ContainsKeyCollisionPositive | 22.763 ns | 0.0222 ns | 0.0173 ns |
 ContainsKeyCollisionNegative | 21.902 ns | 0.0397 ns | 0.0352 ns |

@benaadams
Copy link
Member Author

ref Entry isn't helpful for collisions, so have a faster variant.

Just trying to work on when the loop isn't run (simple versions)

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

Last set of changes d5feaf9 are pretty subtle; but think got there?

Total bytes of diff: -1096 (-0.03% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
       -1096 : System.Private.CoreLib.dasm (-0.03% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
        -683 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(ref):int:this (11 methods)
        -182 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(struct):int:this (7 methods)
        -118 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(int):int:this (5 methods)
         -80 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(long):int:this (4 methods)
         -22 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(short):int:this
         -11 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(char):int:this

6 total methods with size differences (6 improved, 0 regressed), 16739 unchanged.

Before

                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive | 10.817 ns | 0.0294 ns | 0.0275 ns |  92,447,074 per sec
    ContainsKeySimpleNegative |  7.407 ns | 0.0105 ns | 0.0088 ns | 135,007,425 per sec
 ContainsKeyCollisionPositive | 21.208 ns | 0.0282 ns | 0.0263 ns |  47,152,018 per sec
 ContainsKeyCollisionNegative | 20.141 ns | 0.0449 ns | 0.0375 ns |  49,649,967 per sec

After

                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive | 10.822 ns | 0.0166 ns | 0.0147 ns |  92,404,361 per sec
    ContainsKeySimpleNegative |  7.682 ns | 0.0181 ns | 0.0169 ns | 130,174,433 per sec
 ContainsKeyCollisionPositive | 19.956 ns | 0.0474 ns | 0.0421 ns |  50,110,242 per sec
 ContainsKeyCollisionNegative | 18.758 ns | 0.0229 ns | 0.0214 ns |  53,310,587 per sec

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

Before asm

;  V00 this         [V00,T03] ( 27, 43.50)     ref  ->  rsi         this class-hnd
;  V01 arg1         [V01,T14] (  5,  5.50)     ref  ->  rdi         class-hnd
;  V02 loc0         [V02,T09] ( 10, 15.50)     int  ->  rbx        
;  V03 loc1         [V03,T02] ( 24, 62   )     int  ->  r15        
;  V04 tmp0         [V04,T19] (  5,  4.50)    long  ->  rcx        
;  V05 tmp1         [V05,T20] (  3,  3   )     ref  ->  rbx         class-hnd
;  V06 tmp2         [V06,T18] (  6,  5.50)    long  ->  r11        
;* V07 tmp3         [V07    ] (  0,  0   )    long  ->  zero-ref   
;  V08 tmp4         [V08,T12] (  3, 10   )    long  ->  rcx        
;  V09 tmp5         [V09,T11] (  3, 12   )     ref  ->  r13         class-hnd
;  V10 tmp6         [V10,T08] (  4, 16   )     ref  ->  [rsp+0x28]   class-hnd
;  V11 tmp7         [V11,T06] (  6, 22   )    long  ->  r11        
;* V12 tmp8         [V12    ] (  0,  0   )    long  ->  zero-ref   
;  V13 tmp9         [V13,T10] ( 13, 13   )     ref  ->  rcx        
;  V14 tmp10        [V14,T07] ( 18, 18   )     int  ->  rdx        
;  V15 tmp11        [V15,T00] (  9, 72   )     ref  ->  rcx        
;  V16 tmp12        [V16,T05] (  9, 36   )     ref  ->  rcx        
;  V17 tmp13        [V17,T01] (  9, 72   )     ref  ->  rax        
;  V18 OutArgs      [V18    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;  V19 cse0         [V19,T15] (  6,  6   )    long  ->  r14        
;  V20 cse1         [V20,T04] ( 12, 42   )    long  ->  r12        
;  V21 cse2         [V21,T13] ( 17,  8.50)     int  ->   r8        
;  V22 cse3         [V22,T17] ( 11,  5.50)     ref  ->  rax        
;  V23 cse4         [V23,T16] (  6,  6   )    long  ->  rbp        
;
; Lcl frame size = 56
G_M16822_IG01:
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 56
       mov      qword ptr [rsp+30H], rcx
       mov      rsi, rcx
       mov      rdi, rdx
G_M16822_IG02:
       test     rdi, rdi
       je       G_M16822_IG13
G_M16822_IG03:
       cmp      gword ptr [rsi+8], 0
       je       G_M16822_IG08
       mov      rbx, gword ptr [rsi+24]
       mov      rbp, qword ptr [rsi]
       mov      rcx, rbp
       mov      rdx, qword ptr [rcx+48]
       mov      r14, qword ptr [rdx]
       mov      r11, qword ptr [r14+136]
       test     r11, r11
       jne      SHORT G_M16822_IG04
       lea      rdx, [(reloc)]
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax
G_M16822_IG04:
       mov      rcx, rbx
       mov      rdx, rdi
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       mov      ebx, eax
       and      ebx, 0xD1FFAB1E
       mov      rax, gword ptr [rsi+8]
       mov      rcx, rax
       mov      r8d, dword ptr [rax+8]
       mov      eax, ebx
       cdq      
       idiv     edx:eax, r8d
       cmp      edx, r8d
       jae      G_M16822_IG14                  ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rdx, edx
       mov      r15d, dword ptr [rcx+4*rdx+16]
       test     r15d, r15d
       jl       G_M16822_IG08
G_M16822_IG05:                                 ; Loop start
       mov      rcx, gword ptr [rsi+16]
       cmp      r15d, dword ptr [rcx+8]
       jae      G_M16822_IG14                  ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rdx, r15d
       lea      r12, [rdx+2*rdx]
       cmp      dword ptr [rcx+8*r12+32], ebx
       jne      SHORT G_M16822_IG07
       mov      r13, gword ptr [rsi+24]
       mov      rcx, gword ptr [rsi+16]
       cmp      r15d, dword ptr [rcx+8]
       jae      G_M16822_IG14                  ; CORINFO_HELP_RNGCHKFAIL
       mov      rax, gword ptr [rcx+8*r12+16]
       mov      gword ptr [rsp+28H], rax
       mov      rcx, rbp
       mov      r11, qword ptr [r14+144]
       test     r11, r11
       jne      SHORT G_M16822_IG10
       lea      rdx, [(reloc)]
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax
       mov      rax, gword ptr [rsp+28H]
G_M16822_IG06:
       mov      rcx, r13
       mov      rdx, rax
       mov      r8, rdi
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       test     eax, eax
       jne      SHORT G_M16822_IG11
G_M16822_IG07:
       mov      rax, gword ptr [rsi+16]
       cmp      r15d, dword ptr [rax+8]
       jae      G_M16822_IG14                  ; CORINFO_HELP_RNGCHKFAIL
       mov      r15d, dword ptr [rax+8*r12+36]
       test     r15d, r15d
       jge      G_M16822_IG05                  ; Loop end
G_M16822_IG08:
       mov      eax, -1
G_M16822_IG09:
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
G_M16822_IG10:
       mov      rax, gword ptr [rsp+28H]
       jmp      SHORT G_M16822_IG06
G_M16822_IG11:
       mov      eax, r15d
G_M16822_IG12:
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
************** Beginning of cold code **************
G_M16822_IG13:
       mov      ecx, 4
       call     ThrowHelper:ThrowArgumentNullException(int)
       int3     
G_M16822_IG14:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
; Total bytes of code 356, prolog size 27 for method Dictionary`2:FindEntry(ref):int:this

After asm

;  V00 this         [V00,T10] (  9,  6   )     ref  ->  rsi         this class-hnd
;  V01 arg1         [V01,T11] (  5,  5.50)     ref  ->  rdi         class-hnd
;  V02 loc0         [V02,T14] (  9,  5.50)     ref  ->  rbx         class-hnd
;  V03 loc1         [V03,T00] ( 19, 52   )     int  ->  rbp        
;  V04 loc2         [V04,T18] (  4,  3.50)     ref  ->  r14         class-hnd
;  V05 loc3         [V05,T06] (  8, 18   )     int  ->  r13        
;  V06 loc4         [V06,T01] ( 15, 43   )     ref  ->  rbx         class-hnd
;  V07 tmp0         [V07,T16] (  5,  4.50)    long  ->  rcx        
;  V08 tmp1         [V08,T15] (  6,  5.50)    long  ->  r11        
;* V09 tmp2         [V09    ] (  0,  0   )    long  ->  zero-ref   
;  V10 tmp3         [V10,T08] (  3, 10   )    long  ->  rcx        
;  V11 tmp4         [V11,T05] (  5, 20   )     ref  ->  [rsp+0x28]   class-hnd
;  V12 tmp5         [V12,T04] (  6, 22   )    long  ->  r11        
;* V13 tmp6         [V13    ] (  0,  0   )    long  ->  zero-ref   
;  V14 tmp7         [V14,T09] (  9,  9   )     int  ->  rdx        
;  V15 OutArgs      [V15    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;  V16 cse0         [V16,T02] ( 10, 40   )   byref  ->  [rsp+0x20]  
;  V17 cse1         [V17,T12] (  6,  6   )    long  ->  r12        
;  V18 cse2         [V18,T03] ( 10, 32   )    long  ->  rcx        
;  V19 cse3         [V19,T07] (  5, 18   )     int  ->  rcx        
;  V20 cse4         [V20,T13] (  6,  6   )    long  ->  r15        
;  V21 cse5         [V21,T17] (  8,  4   )     int  ->  rcx        
;
; Lcl frame size = 56
G_M16822_IG01:
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 56
       mov      qword ptr [rsp+30H], rcx
       mov      rsi, rcx
       mov      rdi, rdx
G_M16822_IG02:
       test     rdi, rdi
       je       G_M16822_IG10
G_M16822_IG03:
       mov      rbx, gword ptr [rsi+8]
       mov      ebp, -1
       test     rbx, rbx
       je       G_M16822_IG08
       mov      r14, gword ptr [rsi+24]
       mov      r15, qword ptr [rsi]
       mov      rcx, r15
       mov      rdx, qword ptr [rcx+48]
       mov      r12, qword ptr [rdx]
       mov      r11, qword ptr [r12+136]
       test     r11, r11
       jne      SHORT G_M16822_IG04
       lea      rdx, [(reloc)]
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax
G_M16822_IG04:
       mov      rcx, r14
       mov      rdx, rdi
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       mov      r13d, eax
       and      r13d, 0xD1FFAB1E
       mov      ecx, dword ptr [rbx+8]
       mov      eax, r13d
       cdq      
       idiv     edx:eax, ecx
       cmp      edx, ecx
       jae      G_M16822_IG11                  ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rcx, edx
       mov      ebp, dword ptr [rbx+4*rcx+16]
       mov      rbx, gword ptr [rsi+16]
       test     ebp, ebp
       jl       SHORT G_M16822_IG08
G_M16822_IG05:                                 ; Loop start
       mov      ecx, dword ptr [rbx+8]
       cmp      ebp, ecx
       jae      G_M16822_IG11                  ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rcx, ebp
       lea      rcx, [rcx+2*rcx]
       lea      rax, bword ptr [rbx+8*rcx+16]
       mov      bword ptr [rsp+20H], rax
       cmp      dword ptr [rax+16], r13d
       jne      SHORT G_M16822_IG07
       mov      r8, gword ptr [rbx+8*rcx+16]
       mov      gword ptr [rsp+28H], r8
       mov      rcx, r15
       mov      r11, qword ptr [r12+144]
       test     r11, r11
       jne      SHORT G_M16822_IG06
       lea      rdx, [(reloc)]
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax
G_M16822_IG06:
       mov      rcx, r14
       mov      rdx, rdi
       mov      r8, gword ptr [rsp+28H]
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       test     eax, eax
       jne      SHORT G_M16822_IG08
G_M16822_IG07:
       mov      rax, bword ptr [rsp+20H]
       mov      ebp, dword ptr [rax+20]
       test     ebp, ebp
       jge      SHORT G_M16822_IG05             ; Loop end
G_M16822_IG08:
       mov      eax, ebp
G_M16822_IG09:
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
************** Beginning of cold code **************
G_M16822_IG10:
       mov      ecx, 4
       call     ThrowHelper:ThrowArgumentNullException(int)
       int3     
G_M16822_IG11:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
; Total bytes of code 295, prolog size 27 for method Dictionary`2:FindEntry(ref):int:this

{
if (_entries[i].hashCode == hashCode && _comparer.Equals(_entries[i].key, key)) return i;
if (entries[i].hashCode == hashCode && comparer.Equals(key, entries[i].key))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is likely that there is (fragile) code out there that depends on the exact order of arguments passed into the comparer. Switching the order is going to break it. Could you please keep preserve the argument order?

Copy link
Member Author

@benaadams benaadams Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched, though it increases code size. However, have found a pattern to drop the range check which makes up for it

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

Total bytes of diff: -1249 (-0.04% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
       -1249 : System.Private.CoreLib.dasm (-0.04% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
        -683 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(ref):int:this (11 methods)
        -235 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(struct):int:this (7 methods)
        -158 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(int):int:this (5 methods)
        -124 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(long):int:this (4 methods)
         -32 : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(short):int:this

6 total methods with size differences (6 improved, 0 regressed), 16739 unchanged.

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

@mikedn moving Length test to first statement of if rather than loop drops the range check; e.g

do // or while(true)
{
    if ((uint)i >= (uint)entries.Length || (entries[i]

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

G_M16822_IG01:
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 56
       mov      qword ptr [rsp+30H], rcx
       mov      rsi, rcx
       mov      rdi, rdx
G_M16822_IG02:
       test     rdi, rdi
       je       G_M16822_IG11
G_M16822_IG03:
       mov      rbx, gword ptr [rsi+8]
       mov      ebp, -1
       test     rbx, rbx
       je       G_M16822_IG09
       mov      r14, gword ptr [rsi+24]
       mov      r15, qword ptr [rsi]
       mov      rcx, r15
       mov      rdx, qword ptr [rcx+48]
       mov      r12, qword ptr [rdx]
       mov      r11, qword ptr [r12+136]
       test     r11, r11
       jne      SHORT G_M16822_IG04
       lea      rdx, [(reloc)]
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax
G_M16822_IG04:
       mov      rcx, r14
       mov      rdx, rdi
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       mov      r13d, eax
       and      r13d, 0xD1FFAB1E
       mov      ecx, dword ptr [rbx+8]
       mov      eax, r13d
       cdq      
       idiv     edx:eax, ecx
       cmp      edx, ecx
       jae      G_M16822_IG12                  ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rcx, edx
       mov      ebp, dword ptr [rbx+4*rcx+16]
       mov      rbx, gword ptr [rsi+16]
G_M16822_IG05:                                 ; Loop start
       mov      ecx, dword ptr [rbx+8]
       cmp      ecx, ebp
       jbe      SHORT G_M16822_IG09
       movsxd   rcx, ebp
       lea      rcx, [rcx+2*rcx]
       lea      rax, bword ptr [rbx+8*rcx+16]
       mov      bword ptr [rsp+20H], rax
       cmp      dword ptr [rax+16], r13d
       jne      SHORT G_M16822_IG07
       mov      r8, gword ptr [rbx+8*rcx+16]
       mov      gword ptr [rsp+28H], r8
       mov      rcx, r15
       mov      r11, qword ptr [r12+144]
       test     r11, r11
       jne      SHORT G_M16822_IG08
       lea      rdx, [(reloc)]
       call     CORINFO_HELP_RUNTIMEHANDLE_CLASS
       mov      r11, rax
       mov      r8, gword ptr [rsp+28H]
G_M16822_IG06:
       mov      rcx, r14
       mov      rdx, r8
       mov      r8, rdi
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]
       test     eax, eax
       jne      SHORT G_M16822_IG09
G_M16822_IG07:
       mov      rax, bword ptr [rsp+20H]
       mov      ebp, dword ptr [rax+20]
       jmp      SHORT G_M16822_IG05             ; Loop end
G_M16822_IG08:
       mov      r8, gword ptr [rsp+28H]
       jmp      SHORT G_M16822_IG06
G_M16822_IG09:
       mov      eax, ebp
G_M16822_IG10:
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
************** Beginning of cold code **************
G_M16822_IG11:
       mov      ecx, 4
       call     ThrowHelper:ThrowArgumentNullException(int)
       int3     
G_M16822_IG12:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
; Total bytes of code 295, prolog size 27 for method Dictionary`2:FindEntry(ref):int:this

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

shorter int Key version

G_M6157_IG01:
       push     r15
       push     r14
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 32
       mov      rdi, rcx
       mov      esi, edx
G_M6157_IG02:
       mov      rbx, gword ptr [rdi+8]
       mov      ebp, -1
       test     rbx, rbx
       je       SHORT G_M6157_IG05
       mov      r14, gword ptr [rdi+24]
       mov      rcx, r14
       mov      edx, esi
       lea      r11, [(reloc)]
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]IEqualityComparer`1:GetHashCode(int):int:this
       mov      r15d, eax
       and      r15d, 0xD1FFAB1E
       mov      ecx, dword ptr [rbx+8]
       mov      eax, r15d
       cdq      
       idiv     edx:eax, ecx
       cmp      edx, ecx
       jae      SHORT G_M6157_IG07            ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rdx, edx
       mov      ebp, dword ptr [rbx+4*rdx+16]
       mov      rdi, gword ptr [rdi+16]
       mov      ebx, dword ptr [rdi+8]
G_M6157_IG03:                                 ; Loop start
       cmp      ebx, ebp
       jbe      SHORT G_M6157_IG05
       movsxd   rdx, ebp
       mov      r12, rdx
       shl      r12, 4
       cmp      dword ptr [rdi+r12+16], r15d
       jne      SHORT G_M6157_IG04
       mov      edx, dword ptr [rdi+r12+24]
       mov      rcx, r14
       mov      r8d, esi
       lea      r11, [(reloc)]
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]IEqualityComparer`1:Equals(int,int):bool:this
       test     eax, eax
       jne      SHORT G_M6157_IG05
G_M6157_IG04:
       mov      ebp, dword ptr [rdi+r12+20]
       jmp      SHORT G_M6157_IG03             ; Loop end
G_M6157_IG05:
       mov      eax, ebp
G_M6157_IG06:
       add      rsp, 32
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r14
       pop      r15
       ret      
G_M6157_IG07:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
; Total bytes of code 169, prolog size 19 for method Dictionary`2:FindEntry(int):int:this

@benaadams
Copy link
Member Author

Pre

                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive | 10.810 ns | 0.0174 ns | 0.0163 ns |
    ContainsKeySimpleNegative |  7.384 ns | 0.0088 ns | 0.0073 ns |
 ContainsKeyCollisionPositive | 21.159 ns | 0.0345 ns | 0.0323 ns |
 ContainsKeyCollisionNegative | 20.070 ns | 0.0243 ns | 0.0227 ns |

Post

                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive | 11.194 ns | 0.0200 ns | 0.0187 ns |
    ContainsKeySimpleNegative |  7.966 ns | 0.0090 ns | 0.0085 ns |
 ContainsKeyCollisionPositive | 18.188 ns | 0.0152 ns | 0.0142 ns |
 ContainsKeyCollisionNegative | 17.330 ns | 0.0188 ns | 0.0166 ns |

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

While here, thought I'd just checking what the lower bound is for using EqualityComparer<TKey>.Default.Equals for int key

with also having mod Length recognized as bounds safe by switching

i = buckets[hashCode % buckets.Length];

For

i = Unsafe.Add(ref Unsafe.As<byte, int>(ref buckets.GetRawSzArrayData()), hashCode % buckets.Length);
                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive |  5.689 ns | 0.0087 ns | 0.0077 ns | 175,777,817 ops/s
    ContainsKeySimpleNegative |  5.119 ns | 0.0110 ns | 0.0098 ns | 195,350,654 ops/s
 ContainsKeyCollisionPositive | 14.182 ns | 0.0102 ns | 0.0090 ns |  70,511,916 ops/s
 ContainsKeyCollisionNegative | 14.690 ns | 0.0154 ns | 0.0144 ns |  68,073,519 ops/s

@AndyAyersMS
Copy link
Member

The jit ought to be able to recognize on its own that an index of the form (non-negative-value) % array.Length is always within bounds. Likely we don't yet realize that from the mask that the hash code is non-negative and also don't realize that the mod establishes a range for its result.

Can you open an issue for this?

@mikedn
Copy link

mikedn commented Dec 11, 2017

with also having mod Length recognized as bounds safe by switching

Hmm, do I understand correctly from your benchmark results that it's 2x faster if you get rid of the range check?!!

The jit ought to be able to recognize on its own that an index of the form (non-negative-value) % array.Length is always within bounds

We discussed this in the past. Assertion propagation could do this if GT_MOD is present in IR. The trouble is that on ARM morph transforms it in x - (x / y) * y and then it becomes problematic to recognize it.

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

that it's 2x faster if you get rid of the range check?!!

No, most of the speed was from using EqualityComparer<TKey>.Default.Equals and not having any handling for IEqualityComparer<TKey> (for typeof(TKey) == typeof(int) only, else everything breaks)

Just wanted to know if it was worth pursuing the gains

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

0-length arrays, might be a problem? Raised issue https://github.com/dotnet/coreclr/issues/15472

I could add to HashHelpers (in a separate PR)

[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static int GetValueAtIndexModLength(this int[] array, int index)
{
    // Should be: https://github.com/dotnet/coreclr/issues/15472
    //  return buckets[index % array.Length];
    // however the Jit doesn't recognise "mod Length" as bounds safe, so introduces a range check
    return Unsafe.Add(ref Unsafe.As<byte, int>(ref array.GetRawSzArrayData()), index % array.Length);
}

Seems to work asm wise

@mikedn
Copy link

mikedn commented Dec 11, 2017

I could add to HashHelpers (in a separate PR)

You really should measure the effect of eliminating that range check. I doubt that it's significant enough to warrant such a hack. After all we have a division operation that requires 26 cycles and a compare and never taken branch that take 2 cycles and are possible executed in parallel with other instructions.

@benaadams
Copy link
Member Author

benaadams commented Dec 11, 2017

I doubt that it's significant enough to warrant such a hack.

Takes it back under the before change for very simple ContainsKey #15460 (comment)

Equally switching the params from comparer.Equals(entries[i].key, key) to comparer.Equals(key, entries[i].key) does also; so I think its only a single instruction/cycle worse (~0.5ns); which it then makes back if there are any collisions.

I don't think 0.5ns is hugely significant, but it irks me that its not an improvement across the board. Obviously the interface and idiv are far bigger items 😄

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams
Copy link
Member Author

25 min total from scheduled to completion.

Contract failure prevents the test run

MembersMustExist : Member 'System.Buffers.OwnedMemory.Pin()' 
 does not exist in the implementation but it does exist in the contract. 

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams
Copy link
Member Author

Failure is independent of this change https://github.com/dotnet/coreclr/issues/15537

Assert failure: Thread::IsObjRefValid(&objref) 

@benaadams benaadams mentioned this pull request Dec 15, 2017
@benaadams
Copy link
Member Author

Adding devirtualization (with a switch on _comparer == null) #15419

                           Method |      Mean |     Error |    StdDev |
--------------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive_Int |  5.980 ns | 0.0117 ns | 0.0109 ns | x 1.74 devirtualized
    ContainsKeySimpleNegative_Int |  4.846 ns | 0.0073 ns | 0.0069 ns | x 1.52 devirtualized
 ContainsKeyCollisionPositive_Int | 15.208 ns | 0.0289 ns | 0.0271 ns | x 1.40 devirtualized
 ContainsKeyCollisionNegative_Int | 15.247 ns | 0.0330 ns | 0.0292 ns | x 1.31 devirtualized
 ContainsKeySimplePositive_String | 20.636 ns | 0.0324 ns | 0.0270 ns | x 1.03 IEqualityComparer
 ContainsKeySimpleNegative_String | 17.536 ns | 0.0174 ns | 0.0163 ns | x 0.98 IEqualityComparer
   ContainsKeySimplePositive_Type | 16.330 ns | 0.0199 ns | 0.0176 ns | x 1.00 virtual
   ContainsKeySimpleNegative_Type |  7.676 ns | 0.0080 ns | 0.0074 ns | x 1.41 virtual

@benaadams
Copy link
Member Author

Incorporated into #15419 as a commit

@benaadams benaadams closed this Dec 16, 2017
@benaadams benaadams deleted the findentry-cq branch October 14, 2019 00:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants